# Mixture of Experts architecture
Qwen3 0.6B GGUF
Apache-2.0
Qwen3 is the latest generation of the Tongyi Qianwen series of large language models, offering a range of dense and Mixture of Experts (MoE) models. Built on large-scale pretraining, Qwen3 delivers major gains in reasoning, instruction following, agent capabilities, and multilingual support.
Large Language Model English
Q
prithivMLmods
290
1
Qwen3 128k 30B A3B NEO MAX Imatrix Gguf
Apache-2.0
A GGUF quantized build of the Qwen3-30B-A3B Mixture of Experts model, extended to a 128k context window and quantized with the NEO Imatrix technique, supporting multilingual and multi-task use (a minimal GGUF loading sketch follows this entry).
Large Language Model Supports Multiple Languages
Q
DavidAU
17.20k
10
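Models like the one above ship as GGUF files intended for llama.cpp-compatible runtimes. Below is a minimal sketch, assuming llama-cpp-python is installed and a quantized file has already been downloaded from the repository; the file name is hypothetical and the context size should match what the specific quant actually supports.

```python
from llama_cpp import Llama

# Hypothetical local path to one of the NEO Imatrix quantized files;
# download the actual .gguf from the model repository first.
llm = Llama(
    model_path="Qwen3-30B-A3B-128k-NEO-MAX-Q4_K_M.gguf",
    n_ctx=131072,      # request the extended 128k context window
    n_gpu_layers=-1,   # offload all layers to GPU if VRAM allows
)

out = llm(
    "Summarize the idea behind Mixture of Experts in two sentences.",
    max_tokens=128,
)
print(out["choices"][0]["text"])
```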
Granite 4.0 Tiny Base Preview
Apache-2.0
Granite-4.0-Tiny-Base-Preview is a 7-billion-parameter Mixture of Experts (MoE) language model developed by IBM, featuring a 128k-token context window and improved expressiveness through its use of Mamba-2.
Large Language Model
Transformers

G
ibm-granite
156
12
Qwen3 30B A3B GGUF
Apache-2.0
Qwen3 is the latest large language model series developed by Alibaba Cloud; it supports switching between thinking and non-thinking modes and excels at reasoning, multilingual tasks, and agent capabilities (a sketch of the thinking-mode toggle follows this entry).
Large Language Model English
Q
unsloth
261.09k
169
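Qwen3's thinking / non-thinking switch is exposed through its chat template. The sketch below uses Hugging Face transformers with the small Qwen/Qwen3-0.6B checkpoint as a stand-in for the 30B-A3B MoE (the exact model ID and the `enable_thinking` flag follow Qwen3's published usage and should be verified against the model card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B"  # small stand-in; same chat-template family
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "Why do MoE models activate only a few experts per token?"}
]

# enable_thinking toggles Qwen3 between thinking and non-thinking mode
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```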
Qwen3 0.6B Base
Apache-2.0
Qwen3-0.6B-Base belongs to the latest generation of the Tongyi Qianwen series of large language models, which offers a range of dense and Mixture of Experts (MoE) models.
Large Language Model
Transformers

Q
unsloth
10.84k
2
Qwen3 30B A3B GGUF
Apache-2.0
A large language model developed by Qwen, supporting a 131,072-token context length and excelling at creative writing, role-playing, and multi-turn conversation.
Large Language Model
Q
lmstudio-community
77.06k
21
Qwen3 235B A22B GGUF
Apache-2.0
A quantized build of the Qwen team's 235-billion-parameter large language model, supporting a 131k context length and a Mixture of Experts architecture.
Large Language Model
Q
lmstudio-community
22.88k
10
Timemoe 50M
Apache-2.0
TimeMoE is a family of time series foundation models, scaling to billion-parameter sizes, built on the Mixture of Experts (MoE) architecture and focused on time series forecasting.
Time Series Forecasting
T
Maple728
22.02k
13
Tanuki 8x8B Dpo V1.0
Apache-2.0
Tanuki-8x8B is a large-scale language model pretrained from scratch and optimized for dialogue tasks through SFT and DPO.
Large Language Model
Transformers Supports Multiple Languages

T
weblab-GENIAC
217
38
Norwai Mixtral 8x7B Instruct
A Norwegian large language model instruction-tuned from NorwAI-Mixtral-8x7B using approximately 9,000 high-quality Norwegian instructions.
Large Language Model
Transformers

N
NorwAI
144
2
Qwen2
Other
The Tongyi Qianwen Qwen2 series of large language models, covering parameter scales from 0.5 billion to 72 billion and including instruction-tuned variants.
Large Language Model
Q
cortexso
132
1
Hkcode Solar Youtube Merged
MIT
A Korean language model continually pretrained from SOLAR-10.7B, developed by the Fintech Department of Korea Polytechnics.
Large Language Model
Transformers Korean

H
hyokwan
3,638
1
Karakuri Lm 8x7b Chat V0.1
Apache-2.0
A Mixture of Experts (MoE) model developed by KARAKURI, fine-tuned from Swallow-MX-8x7b-NVE-v0.1 and supporting dialogue in English and Japanese.
Large Language Model
Transformers Supports Multiple Languages

K
karakuri-ai
526
23
Jambatypus V0.1
Apache-2.0
A large language model fine-tuned from Jamba-v0.1 with QLoRA on the Open-Platypus-Chat dataset, supporting conversational tasks.
Large Language Model
Transformers English

J
mlabonne
21
39
MGM 7B
MGM-7B is an open-source multimodal chatbot built on Vicuna-7B-v1.5, supporting high-definition image understanding, reasoning, and generation.
Text-to-Image
Transformers

M
YanweiLi
975
8
Mixtral Chat 7b
MIT
A merged model created with the mergekit tool from several Mistral-7B variants, focused on text generation tasks.
Large Language Model English
M
LeroyDyer
76
2
Swallow MX 8x7b NVE V0.1
Apache-2.0
Swallow-MX-8x7b-NVE-v0.1 is a Mixture of Experts model continually pretrained from Mixtral-8x7B-Instruct-v0.1, primarily enhancing its Japanese capabilities.
Large Language Model
Transformers Supports Multiple Languages

S
tokyotech-llm
1,293
29
Mixtral 8x7B Holodeck V1 GGUF
Apache-2.0
A GGUF-format model fine-tuned from Mixtral 8x7B and designed specifically for Koboldcpp; its training data includes approximately 3,000 multi-genre e-books.
Large Language Model English
M
KoboldAI
376
15
Orthogonal 2x7B V2 Base
orthogonal-2x7B-v2-base is a Mixture of Experts model combining Mistral-7B-Instruct-v0.2 and SanjiWatsuki/Kunoichi-DPO-v2-7B, specializing in text generation tasks.
Large Language Model
Transformers

O
LoSboccacc
80
1
Air Striker Mixtral 8x7B Instruct ZLoss 3.75bpw H6 Exl2
Apache-2.0
An experimental merge fine-tuned from Mixtral-8x7B-v0.1, supporting an 8K context length and the ChatML prompt format.
Large Language Model
Transformers English

A
LoneStriker
49
9
Sauerkrautlm Mixtral 8x7B GGUF
Apache-2.0
SauerkrautLM Mixtral 8X7B is a multilingual text generation model based on the Mixtral architecture. It has been fine-tuned and aligned using SFT and DPO, and supports English, German, French, Italian, and Spanish.
Large Language Model
Transformers Supports Multiple Languages

S
TheBloke
403
8
Nllb Moe 54b 4bit
NLLB-MoE is a Mixture of Experts machine translation model developed by Meta, supporting 200 languages and ranking among the most capable open-access machine translation models available (a minimal translation sketch follows this entry).
Machine Translation
Transformers Supports Multiple Languages

N
KnutJaegersberg
17
5
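NLLB models, including the MoE variant, are seq2seq translators in which the target language is chosen by forcing its language code as the first generated token. The sketch below uses the small distilled NLLB checkpoint as a stand-in (an assumed model ID; the 54B MoE 4-bit build follows the same pattern but needs quantization-specific loading that varies by repository):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Small stand-in checkpoint; the 54B MoE variant uses the same interface.
model_id = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer(
    "Mixture of Experts routes each token to a few experts.",
    return_tensors="pt",
)

# Force the decoder to start with the target-language code (here: French).
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
    max_new_tokens=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```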
Switch C 2048
Apache-2.0
A Mixture of Experts (MoE) model trained on masked language modeling, with a parameter count of 1.6 trillion. It uses a T5-like architecture but replaces the dense feed-forward layers with sparse MLP (expert) layers (a masked span-filling sketch follows this entry).
Large Language Model
Transformers English

S
google
73
290
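Switch Transformers follow the T5 text-to-text interface, so masked spans are marked with sentinel tokens and the model fills them in. The sketch below uses a small public Switch checkpoint as a stand-in (an assumed model ID) for the 1.6-trillion-parameter Switch-C model, which is far too large to load this way:

```python
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

# Small stand-in; google/switch-c-2048 shares the interface but is ~1.6T params.
model_id = "google/switch-base-8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = SwitchTransformersForConditionalGeneration.from_pretrained(model_id)

# T5-style masked language modeling: sentinel tokens mark the spans to predict.
text = (
    "A Mixture of Experts layer routes each token to <extra_id_0> experts "
    "chosen by a learned <extra_id_1>."
)
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```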